
    ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection

    Full text link
    Most advances in medical lesion detection networks are limited to subtle modifications of conventional detection networks designed for natural images. However, there exists a vast domain gap between medical images and natural images: medical image detection often suffers from several domain-specific challenges, such as high lesion/background similarity, dominant tiny lesions, and severe class imbalance. Is a hand-crafted detection network tailored for natural images really good enough for the discrepant medical lesion domain? Are there more powerful operations, filters, and sub-networks better suited to medical lesion detection waiting to be discovered? In this paper, we introduce a novel ElixirNet that includes three components: 1) a TruncatedRPN that balances positive and negative data for false-positive reduction; 2) an Auto-lesion Block that is automatically customized for medical images to incorporate relation-aware operations among region proposals, leading to more suitable and efficient classification and localization; 3) a Relation transfer module that incorporates the semantic relationship and transfers relevant contextual information via an interpretable graph, thus alleviating the lack of annotations for all lesion types. Experiments on DeepLesion and Kits19 demonstrate the effectiveness of ElixirNet, which improves both sensitivity and precision over FPN with fewer parameters. Comment: 7 pages, 5 figures, AAAI202

    CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

    Full text link
    Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks. However, due to the limited text-3D data pairs, adapting the success of 2D Vision-Language Models (VLM) to the 3D space remains an open problem. Existing works that leverage VLMs for 3D understanding generally resort to constructing intermediate 2D representations for the 3D data, but at the cost of losing 3D geometry information. To take a step toward open-world 3D vision understanding, we propose Contrastive Language-Image-Point Cloud Pretraining (CLIP²) to directly learn a transferable 3D point cloud representation in realistic scenarios with a novel proxy alignment mechanism. Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios. On top of that, we propose a cross-modal contrastive objective to learn semantically and instance-level aligned point cloud representations. Experimental results on both indoor and outdoor scenarios show that our learned 3D representation has great transfer ability in downstream tasks, including zero-shot and few-shot 3D recognition, where it boosts the state-of-the-art methods by large margins. Furthermore, we provide analyses of the capability of different representations in real scenarios and present an optional ensemble scheme. Comment: To appear at CVPR 202
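    The cross-modal contrastive objective mentioned in this abstract is in the InfoNCE family: matched pairs across modalities are pulled together while mismatched pairs in the batch act as negatives. Below is a minimal NumPy sketch of such a symmetric pairwise loss; the function name, embedding shapes, and temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.07):
    """InfoNCE loss between two batches of paired embeddings.

    anchor, positive: (N, D) arrays where row i of each batch is a
    matched cross-modal pair; all other rows serve as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (N, N); diagonal = matched pairs
    # numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    # cross-entropy with the matched pair on the diagonal as the target
    return -log_probs[idx, idx].mean()
```

In a text-image-point setting, the same loss would typically be applied to each pair of modalities (text-point, image-point) and the terms summed; perfectly aligned embeddings drive the loss toward zero, while unrelated embeddings leave it near log N.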

    Co-Creating for Locality and Sustainability: Design-Driven Community Regeneration Strategy in Shanghai’s Old Residential Context

    No full text
    Community regeneration has drawn much attention in both the urban development and sustainable design fields in the last decade. As a response to the regeneration challenges of Shanghai’s old and high-density communities, this article proposes two design-driven strategies: enabling residents to become innovation protagonists, and facilitating collaborative entrepreneurial clusters based on the reorganization of community resources. Two ongoing collaborative projects between the Siping community and Tongji University—Open Your Space microregeneration (OYS) and the Neighborhood of Innovation, Creativity, and Entrepreneurship Towards 2035 (NICE 2035) living labs project—are adopted as the main case studies. Research findings are put forward through a structured analysis of qualitative data. Firstly, we reviewed the situation and sustainability goals of Shanghai’s old residential communities, and how design-centric social innovation and collaboration can serve as effective interventions. Secondly, we analyzed resident empowerment approaches to decision-making, co-design, and co-management processes in OYS through participatory observation. Finally, through participant interviews and key-event analysis in NICE 2035, we investigated how living labs reuse distributed community resources to develop lifestyle-based business prototypes. This article proposes a co-creation mechanism and action guides towards localized and sustainable community regeneration, which can provide a contextual paradigm for similar challenges.

    3D Human Pose Machines with Self-supervised Learning

    No full text

    How to Save your Annotation Cost for Panoptic Segmentation?

    No full text
    How can the annotation cost for panoptic segmentation be properly reduced? How can the cost-quality trade-off between training data and models be leveraged and optimized? These questions are key challenges towards a label-efficient and scalable panoptic segmentation system, given its expensive instance/semantic pixel-level annotation requirements. By closely examining different kinds of cheaper labels, we introduce a novel multi-objective framework to automatically determine the allocation of different annotations, so as to reach better segmentation quality at lower annotation cost. Specifically, we design a Cost-Quality Balanced Network (CQB-Net) to generate the panoptic segmentation map, which distills the crucial relations between various supervisions, including panoptic labels, image-level classification labels, bounding boxes, and the semantic coherence information between foreground and background. Instead of ad-hoc allocation during training, we formulate the optimization of the cost-quality trade-off as a Multi-Objective Optimization Problem (MOOP). We model the marginal quality improvement of each annotation and approximate the Pareto front to enable a label-efficient allocation ratio. Extensive experiments on the COCO benchmark show the superiority of our method, e.g., achieving a segmentation quality of 43.4% compared to 43.0% for OCFusion while saving 2.4x annotation cost.
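    The allocation idea in this abstract (model the marginal quality gain of each annotation type, then keep only Pareto-optimal cost/quality allocations) can be illustrated with a toy enumeration. In the sketch below, the per-unit costs, the gain values, and the square-root diminishing-returns quality model are hypothetical placeholders, not CQB-Net's actual quality model or optimizer.

```python
import itertools

# hypothetical per-unit annotation costs and quality gains
COSTS = {"panoptic": 10.0, "box": 2.0, "image_label": 0.5}
GAIN = {"panoptic": 5.0, "box": 1.5, "image_label": 0.4}

def quality(alloc):
    # concave model: each extra annotation of a type helps less (sqrt)
    return sum(GAIN[k] * alloc[k] ** 0.5 for k in alloc)

def cost(alloc):
    return sum(COSTS[k] * alloc[k] for k in alloc)

def pareto_front(candidates):
    # keep allocations that no other allocation dominates
    # (dominates = no worse on cost AND quality, strictly better on one)
    front = []
    for a in candidates:
        ca, qa = cost(a), quality(a)
        dominated = any(
            cost(b) <= ca and quality(b) >= qa
            and (cost(b) < ca or quality(b) > qa)
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front

# enumerate all allocations of 0..5 units per annotation type
grid = range(6)
candidates = [dict(zip(COSTS, c)) for c in itertools.product(grid, repeat=3)]
front = pareto_front(candidates)
```

A real system would replace the brute-force enumeration with the paper's MOOP approximation, but the resulting front serves the same purpose: picking an allocation ratio on it guarantees no cheaper allocation achieves the same modeled quality.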